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Ice formation is ubiquitous in nature, with important consequences in a variety of environments, 
including biological cells, soil, aircraft, transportation infrastructure and atmospheric clouds. How¬ 
ever, its intrinsic kinetics and microscopic mechanism are difficult to discern with current exper¬ 
iments. Molecular simulations of ice nucleation are also challenging, and direct rate calculations 
have only been performed for coarse-grained models of wate. For molecular models, only indirect 
estimates have been obtained, e.g. by assuming the validity of classical nucleation theory. We use 
a path sampling approach to perform the first direct rate calculation of homogeneous nucleation of 
ice in a molecular model of water. We use TIP4P/Ice, the most accurate among existing molecular 
models for studying ice polymorphs. By using a novel topological approach to distinguish different 
polymorphs, we are able to identify a freezing mechanism that involves a competition between cubic 
and hexagonal ice in the early stages of nucleation. In this competition, the cubic polymorph takes 
over since the addition of new topological structural motifs consistent with cubic ice leads to the 
formation of more compact crystallites. This is not true for topological hexagonal motifs, which 
give rise to elongated crystallites that are not able to grow. This leads to transition states that 
are rich in cubic ice, and not the thermodynamically stable hexagonal polymorph. This mechanism 
provides a molecular explanation to the earlier experimental and computational observations of the 
preference for cubic ice in the literature. 


Ice nucleation affects the behaviour of many sys¬ 
tems [1-6]. For example, the formation of ice crys¬ 
tals inside the cytoplasm can damage living cells [1]. 
The amount of ice in a cloud determines both its light¬ 
absorbing properties [5] and its precipitation propen¬ 
sity [6], and is therefore an important input parameter 
in many meteorological models [7, 8]. Yet, current ex¬ 
periments are incapable of uncovering the kinetics and 
the molecular mechanism of freezing due to their limited 
spatiotemporal resolution. The ice that nucleates ho¬ 
mogeneously in the atmosphere and vapor chamber ex¬ 
periments is predominantly comprised of the cubic-rich 
stacking-disordered polymorph, not the thermodynam¬ 
ically stable hexagonal polymorph [9, 10]. This obser¬ 
vation has been rationalized invoking the Ostwald step 
rule [11]. However, the molecular origin of this prefer¬ 
ence is unknown due to the limited spatiotemporal reso¬ 
lution of existing experimental techniques. Furthermore, 
experimental measurements of nucleation rates are only 
practical over narrow ranges of temperatures [12], with 
any extrapolation being prone to large uncertainties. 



Size of the Largest Crystallite 

FIG. 1: Cumulative transition probability vs. size 
of the largest crystalline nucleus in the TIP4P/Ice 
system at 230 K and 1 bar. The inflection region is shown 
in shaded purple. Several representative crystallites are also 
depicted. The cumulative probability curve for the LJ system 
simulated at ksTje = 0.82 and pa^je = 5.68 is shown in the 
inset with e and a the LJ energy and size parameters. No 
inflection region is observed in the LJ system. 


Computer simulations are attractive alternatives in 
this quest, as they make it possible to obtain at any 
given thermodynamic condition a statistically represen¬ 
tative sample of nucleation events that can then be used 
to estimate the rates and identify the mechanism of nucle¬ 
ation. This, however, has only been achieved [13-15] for 
coarse-grained representations of water, such as mW [16]. 
For the more realistic molecular force-fields, all the ex¬ 
isting studies have relied either on launching a few /is- 
long molecular dynamics (MD) trajectories [17, 18], or on 
applying external fields [19], or biasing potentials along 


pre-chosen reaction coordinates [20] to drive nucleation, 
and the generation of statistically representative nucle¬ 
ation trajectories that can allow direct and accurate rate 
predictions has so far been beyond reach. 

In this work, we achieve this goal in a system of 4,096 
water molecules at 230 K and 1 bar by introducing a 
novel coarse-graining modification to the path sampling 
method known as forward-flux sampling (FFS) [21]. In 
the FFS approach, the nucleation process is sampled in 
stages defined by an order parameter, A. In crystal- 
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lization studies, A is typically chosen as the size of the 
largest crystalline nucleus in the system [13-15]. Indi¬ 
vidual molecules are labeled as solid- or liquid-like based 
on the Steinhardt order parameters [22], and the neigh¬ 
boring solid-like molecules are connected to form a clus¬ 
ter (For further details, refer to the SI and Fig. SI.). 
The cumulative probability of growing a crystallite with 
A molecules is then computed from the success probabil¬ 
ities at individual stages. If a sufficiently large number 
of trajectories are sampled at each stage, the nucleation 
mechanism can be accurately determined by inspecting 
the ensemble of pseudo-trajectories that connect the liq¬ 
uid and crystalline basins. We use the term ’pseudo- 
trajectory‘ as, during FFS, all velocities are randomized 
at any given milestone. 

In conventional FFS, the underlying MD trajectories 
are monitored as frequently as possible, usually every 
single MD step. In the TIP4P/Ice system, however, this 
approach is unsuccessful as the cumulative growth prob¬ 
ability never converges (plateaus) and instead plummets 
unphysically (Fig. S2a). Because of the five-orders-of- 
magnitude separation between the structural relaxation 
time, Tr (Fig. 2a), and the sampling time, r^, the high- 
frequency fluctuations in \(t) do not reflect physically 
relevant structural transformations. We therefore filter 
such high-frequency fluctuations by computing the or¬ 
der parameter along MD trajectories less frequently. We 
choose Ts = 1 ps, which is still around three orders of 
magnitude smaller than the hydrogen bond relaxation 
time [23] (Fig. 2c). By decreasing the separation between 
Ts and r^, the FFS calculation converges and the cumu¬ 
lative probability eventually plateaus (Fig. 1). The com¬ 
puted nucleation rate is log^^g ^ = 5.9299 ± 0.6538- R in 
m“^'S“^. This implies, statistically, one nucleation event 
per 9 X 10^^ s in the 4,096-molecule system considered 
in this work, that has an average volume of ~ 125 nm^. 
Note the astronomical separation of time scales between 
structural relaxation (r^ = 0.6 ns) and ice nucleation. 
This rate is placed in the context of earlier experimental 
estimates [12, 24] below (see Comparison with Experi¬ 
mental Rate Measurements). We confirm the accuracy of 
the coarse-grained FFS by observing that the computed 
crystallization rates in the Lennard-Jones (LJ) system 
are insensitive to Tq if Tg/r^ < 10“^ (Fig- 3 and 2b). 

For most materials, the probability of adding a certain 
number of molecules to a crystallite of A molecules in¬ 
creases with A. This leads to a consistent positive curva¬ 
ture in the cumulative probability curve e.g. in the crys¬ 
tallization of the LJ system (Fig. 1 inset and Fig. S3a). 
For water, however, the cumulative probability curve has 
a pronounced inflection at A 30, where the proba¬ 
bility of growing an average crystallite decreases signifi¬ 
cantly with A before rebounding again at larger A’s. The 
inflection is accompanied by non-monotonicities in sev¬ 
eral other mechanical observables. For instance, in the 
inflection region, the average density increases with A 


(Fig. 4d), even though there is an overall decrease in 
density upon crystallization. We observe similar non- 
monotonicties in the longest principal axes (Fig. 4a) and 
the asphericity (Fig. 4b) of the largest crystallite, as well 
as the number of five-, six- and seven-member rings in the 
system (Fig. 4c). The non-mo notonicity in ring size dis¬ 
tribution has also been observed in the freezing of ST2, 
another molecular model of water [25]. In the LJ system, 
however, all of these quantities evolve monotonically from 
their averages in the liquid to their averages in the crys¬ 
tal (Fig. 4 insets and Fig. S3). In the coarse-grained mW 
system, this inflection is present, but is very mild, and 
the non-monotonicities are much weaker (Fig. S4). 

In order to understand the origin of this inflection, 
we examine all the configurations in the shaded purple 
regions of Figs. 1 and 4, and identify those that ’sur- 
vive‘ the inflection region by giving rise to a progeny at 
A = 41. Visual inspection of these configurations reveals 
an abundance of double-diamond cages (DDCs) in their 
largest crystallites. DDCs (Fig. 5a) are the basic build¬ 
ing blocks of cubic ice (Jc), and are topologically identical 
to the carbon backbone of the polycyclic alkane diaman- 
tane [26]. The largest crystallites of the Vanishing^ con¬ 
figurations, however, are rich in hexagonal cages (HCs) 
(Fig. 5b), the basic building blocks of hexagonal ice {In)- 
We then use a topological criterion to detect DDCs and 
HCs (See SI). In this approach, all primitive hexagonal 
rings in the nearest neighbor network are identified, and 
DDCs and HCs are detected based on the connectivity 
of the neighboring hexagonal rings (See SI for further 
details.). We identify several isolated cages even in the 
supercooled liquid. Due to their distorted geometries, 
however, such cages can only be detected topologically, 
and not through conventional order parameters such as 
gs [13]. Similar to the crystallites that are clusters of 
neighboring molecules with local solid-like environments 
(see SI), the cages that share molecules can also be clus¬ 
tered together to define interconnected DDC/HC net¬ 
works. With their constituent cages detected topologi¬ 
cally, such networks can contain both solid- and liquid¬ 
like molecules. We observe that almost all the molecules 
of the largest crystallites participate in DDC/HC net¬ 
works. This is consistent with earlier experimental and 
computational observations [10, 27] that the ice that nu¬ 
cleates from supercooled water is a stacking-disordered 
mixture of both Ic and R polymorphs. 

Consistent with our visual observation, a stark dif¬ 
ference exists between the DDC makeup of the surviv¬ 
ing and vanishing configurations. In the surviving con¬ 
figurations, the water molecules of the largest crystal¬ 
lite are more likely to participate in DDCs than in HCs 
(Figs. 5c-d), making the corresponding crystallites more 
cubic than the average. Such cubic-rich configurations 
are scarce at the beginning and only grow in number 
towards the end of the inflection region. Conversely, 
the majority of configurations, which are HC-rich, be- 
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FIG. 2: Structural relaxation in the supercooled liquid. Self-intermediate scattering functions computed from MD 
simulations of (a) the TIP4P/Ice (blue) and the mW (orange) systems at 230 K and 1 bar, and (b) the LJ system at ksTje = 0.82 
and pnct^ = 0.974. In each case, q* is in close correspondence with the hrst peak of S{q), the structure factor, in the 
corresponding system. The structural relaxation time, Tr, is defined as the time at which Fs{q*,t) = 1/e. (c) c(t), the 

hydrogen-bond correlation function, computed in NpT simulations of a system of 216 TIP4P/Ice molecules at 230 K and 1 bar. 
Th is defined as c{Th) = 1/e. 



Sampling Time 

FIG. 3: Effect of Ts, the sampling time, on fluxes, cu¬ 
mulative probabilities and nucleation rates computed 
from a series of FFS calculations conducted for a sys¬ 
tem of 4,096 Lennard-Jones atoms at ksTje = 0.82 and 
pa^je = 5.68. Divergence only occurs when Ts becomes com¬ 
parable to Tr. Gomputed quantities are insensitive to Ts for 
Ts ^ Tr. All quantities are in the LJ dimensionless units. 


come extinct towards the end of the inflection region. 
This preference can be explained by comparing the ge¬ 
ometric features of the HC-rich and DDC-rich crystal¬ 
lites. While the DDC-rich crystallites are comparatively 
uniform in shape (Fig. 5h), the HC-rich crystallites are 


more aspherical (Fig. 5g), and therefore less likely to grow 
and survive the inflection region. This higher aspheric- 
ity arises from the preferential addition of new HCs to 
the prismatic faces of the existing HCs, as evident in 
the abrupt increase in the ratio of prismatic to basal 
HC-HC connections in the inflection region (Fig. S5f). 
This is qualitatively consistent with earlier observations 
showing that the growth of bulk hexagonal ice is faster 
along its prismatic plane [28]. The preference for cubic 
ice in the early stages of nucleation has been observed in 
previous studies of ice formation in different water mod¬ 
els [27, 29, 30]. However, the molecular origin of this 
preference had not been identified prior to this work. In¬ 
deed, the non-monotonicties in the shape and asphericity 
of the largest crystallite almost disappear when only the 
surviving configurations are considered (Fig. 5e-f). A 
similar correlation exists between the DDC makeup of a 
configuration and its density and ring size distribution 

(Fig. S6). 

Fig. 6 depicts the fate of the cubic-rich crystallites 
that survive the inflection region. Due to the thermo¬ 
dynamic stability of Ih relative to Ic, one expects the 
surviving cubic-rich crystallites to eventually transform 
into Ih. We observe no such transformation during the 
nucleation process, and the crystallites retain their high 
DDC content (Fig. 6a) even after they are post-critical 
(Fig. 6g). (For a discussion of criticality, wee SI and 
Fig. S7b.) This suggests the need for caution in the in¬ 
terpretation of earlier indirect calculations of nucleation 
rate [17] in which the critical nuclei are assumed to be 
exclusively hexagonal. We also observe no tendency for 
the hexagonal polymorph to prefer the core of the crys¬ 
tallite. This is in contrast to the traditional picture of 
nucleation in which the more thermodynamically stable 
phase concentrates at the core, with a shell of the less 
stable phase shielding it from the liquid [31]. Instead, 
we observe a large number of exposed hexagonal cages at 
the surface (Figs. 6b-g), with attrition tendencies simi¬ 
lar to the HCs in the inflection region (e.g. the HC ap- 
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FIG. 4: Nonmonotonicities in average mechanical observables for the configurations obtained from the FFS 
calculation. The insets correspond to the FFS calculation in the LJ system, (a) Radius of gyration (Rg), principal axes 
(oi > 02 > 03 ) and (b) asphericity of the largest crystallite, (c) Ring statistics and (d) density of the system. Nk{X) 
corresponds to the average number of /c-member rings at A, with Nk,i = Nk{Xi). For water, five-, six- and seven-member 
rings are enumerated, while for the LJ system, three-, four- and five-member rings are enumerated. The shaded purple region 
corresponds to the infiection region. All quantities are in dimensionless units for the LJ system. 


pendages in Fig. 6d and the large prismatic-to-basal ratio 
in Fig. S5f). The propensity to grow more cubic stacks 
even after the inflection region is consistent with the pro¬ 
posed mechanism as the addition of new HCs to a large 
crystallite is more likely to lead to chain-like appendages 
at the surface, henceforth making it less stable than an 
equally-sized crystallite grown via the addition of DDCs. 
Indeed, the propensity to form thicker cubic stacks has 
been observed in the growth and consolidation of post- 
critical crystallites in the growth-limited freezing of the 
mW system [27]. 

COMPARISON WITH EXPERIMENTAL RATE 
MEASUREMENTS 

As mentioned above, experimental measurements of 
nucleation rate are only practical over a narrow range of 
thermodynamic conditions. This is because of the fun¬ 
damental limitation of existing experimental techniques, 
which are based on probing the temporal evolution of the 
number of freezing events that are detected in a small 
population of supercooled micro-droplets [32]. There¬ 
fore, the nucleation rates that can be measured from 
the existing experimental techniques can span few or¬ 
ders of magnitude only, confining the range of thermo¬ 
dynamic conditions over which nucleation rates are mea¬ 
surable. For temperatures that are outside this range, 
the nucleation rate is either so small that none of the 


micro-droplets would freeze during the timescale of the 
experiment, or is so large that all droplets would freeze 
immediately. For droplets as small as a few microme¬ 
ters in diameter, nucleation rates have been measured 
for temperatures as low as 234 K, which corresponds to 
a supercooling of 39 K, 3 K smaller than the supercool¬ 
ing considered in this work. (The melting temperature of 
the TIP4P/Ice model is 272.2 K [33] vs. the experimental 
melting temperature of 273 K. Henceforth, our temper¬ 
ature of 230 K corresponds to a supercooling of 42 K.) 
Therefore, no direct comparison can be made between 
our computed nucleation rate and any actual experimen¬ 
tal measurement, without extrapolating to lower temper¬ 
atures. These extrapolations are typically based on clas¬ 
sical nucleation theory, and are prone to large uncertain¬ 
ties, leading to large variations in the extrapolated nucle¬ 
ation rates. In particular, such extrapolations fail to take 
into account the transition to the transport-controlled 
nucleation at low temperatures, which is responsible for 
the appearance of a maximum in the nucleation rate with 
respect to temperature. For real water, the temperature 
of maximum crystallization rate has been estimated to 
be ^ 225 K [34], which is very close to the temperature 
considered in this work. Such extrapolations yield a wide 
range of nucleation rates at a supercooling of 42 K, from 
10i8m-3.s-i in Ref. [12, 35] to 10^^m“^-s“^ in Ref. [36]. 
Another potential source of error, which can lead to a sys¬ 
tematic overestimation of rates at lower temperatures, is 
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FIG. 5: Competition between cubic and hexagonal ice in the inflection region, (a) Double-Diamond, and (b) 
hexagonal cages, (c-d) Number of water molecules in the largest crystallite that participate in (c) a DDC and (d) an HC. 
(e) The longest principal axis and (f) asphericity of the largest crystallite, (g) A pseudo-trajectory that does not survive the 
inflection region. DDC and HC cages shown in blue and red, respectively. Yellow particles belong to both a DDC and an 
HC. Note the abundance of HCs. (h) A pseudo-trajectory that survives the inflection region. Note the abundance of DDCs. 
Molecules that are part of the largest crystallite (based on qo) are shown larger than liquid-like molecules that participate in 
the topological DDC/HC network that encompasses the largest crystallite. 


the possibility of surface-dominated nucleation in smaller 
droplets that are typically used for rate measurements at 
lower temperatures [37]. 

Thanks to the superior temporal resolution of new ex¬ 
perimental techniques, direct measurements of nucleation 
rates at larger supercoolings will be possible in the near 
future. One such technique is the femtosecond X-ray 
laser pulsing that was recently used by Sellberg et al. [24] 
to probe the structural transformation of a population of 
evaporatively cooled micro-droplets of supercooled wa¬ 
ter. Although no nucleation rates are reported in their 
work, it is possible to obtain an approximate estimate 
using Fig. 2 of Ref. [24], which depicts the temporal pro¬ 
files of the temperature and the ice fraction of evapora¬ 
tively cooled 12 yum droplets. The first frozen droplets 
are detected approximately four milliseconds after they 
enter the chamber and when they reach a temperature of 
229 K. The average freezing time of 4 ms can be used to 
obtain an upper bound of Jy ~ 2.7631 x 10^^m“^ • s“^ 
for the homogeneous nucleation rate at the supercool¬ 
ing of 42 K. This is around eleven orders of magnitude 


larger than the rate computed in this work. As we will 
discuss below, however, this discrepancy is reasonable 
considering the sensitivity of the nucleation rate to dif¬ 
ferent thermodynamic features of the system. According 
to the classical nucleation theory, the nucleation rate is 
proportional to expl—AGc/ksT] with AGc, the free en¬ 
ergy barrier associated with the formation of a critical 
nucleus, given by: 


AGe 


167r7^ 

3/)2|A//|2 


( 1 ) 


Here 7 is the solid-liquid surface tension, ps is the number 
density of the solid, and A/i is the free energy difference 
between the crystalline and liquid phases. The exponen¬ 
tial dependence of the nucleation rate on AGc, and the 
sensitivity of AGc to 7 and A/i implies that only a slight 
deviation of any of these quantities from the experimen¬ 
tal value can shift the nucleation rate by several orders of 
magnitude. Both these quantities are difficult to measure 
at large super coolings, mostly because of the difficulty of 
stabilizing supercooled water at such low temperatures. 
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TABLE I: Numerical correlations used for fitting the experimental heat capacity measurements of Refs. [38, 39]. Units are in 
cal • moU^ • K“^. For hexagonal ice, we use a linear fit, while for supercooled water we use a combination of a power law and 
a linear fit. The actual experimental data for supercooled water are for T > 236 K. We thus use the numerical fit provided 
below to extrapolate (7p at 231 K < T < 236 K. 


Phase 

Correlation 

Supercooled Water[39] 
Hexagonal lce[38] 

C'p(T) = 4 X 1015^-5-824 0.7131T - 202 

Cp{T) = 0.032T +0.3252 



FIG. 6: Nucleation beyond the inflection region, (a) 

Average cage participation of the molecules in the largest 
crystallite. The solid black line has a slope of unity. The 
molecules that participate in a DDC (or HC) are included 
in the corresponding count even if they also participate in a 
neighboring cage of the other type. The overwhelming ma¬ 
jority of molecules are at least part of a DDC, while very few 
molecules are only a part of an HC. (b-g) Several representa¬ 
tive configurations obtained at different milestones after the 
inflection region, (b-e) are pre-critical, (f) is critical and (g) 
is post-critical. Molecules that are a part of a DDC, an HC, 
or both are depicted in dark blue, dark red, and light yellow 
respectively. Here, we use the same size convention used in 
Fig. 5. 


Free Energy Difference 

If AiLj, the latent heat of fusion, is not a strong 
function of temperature, A/i can be approximated as 
A/i AHf{Tf — T)/Tf. This approach, which yields 
a value of A/i ^ 0.2215 kcal • moP^ at a supercooling 
of 42 K, is, however, not very accurate for water due to 
its heat capacity anomaly. In order to obtain a more 
accurate estimate, we take the heat capacity measure¬ 
ments for hexagonal ice [38] and supercooled water [39] 
(Table I), and use thermodynamic integration to obtain 


a more accurate estimate of A/iexp = 0.1855 kcal - moP^. 
Similarly we use MD simulations in the NpT ensemble 
to compute enthalpies of hexagonal ice and supercooled 
water at 230 K< T < 272 K and utilize those enthalpies 
to compute A/i using thermodynamic integration. We 
obtain a value of A/iTiP 4 p/ice = 0.147 kcal-moP^ for the 
TIP4P/Ice system, which is around 20 per cent smaller 
than A/iexp- This discrepancy alone can lead to an over¬ 
estimation of the nucleation barrier by as much as 60 per 
cent if everything else is identical. To be more quantita¬ 
tive, the classical nucleation theory predicts a nucleation 
barrier of AGc = ^\Ap\Nc ~ blksT for the TIP4P/Ice 
system at T = 230 K. However, if we use A/iexp instead 
of A/UTIP4P/Ice, and Ps.exp = 0.922 g • cm“^ (Ref. [40]) 
instead of /9s_TiP4P/ice = 0.908 g ■ cni“^ (obtained from 
NpT MD simulation of Ih at 230 K and 1 bar) in (1), 
we obtain a barrier of 31k bT^ which corresponds to 
an increase in the nucleation rate by 8-9 orders of magni¬ 
tude. This is very close to the discrepancy between our 
calculation and the experimental estimates of rate. 


Surface Tension 

At temperatures below Tj, the supercooled water that 
is in contact with ice is not stable and will immediately 
freeze. This makes experimental measurements of 7 in 
the supercooled regime extremely challenging. There¬ 
fore, 7 is typically estimated indirectly from the nucle¬ 
ation data assuming the validity of the classical nucle¬ 
ation theory. Consequently, there is a large variation in 
the reported estimates of 7 for supercooled water that 
span between 25 and 35 mJ-m“^ (Ref. [41]). Similarly, it 
is very challenging to compute 7 directly from molecular 
simulations at T < Tj, and all the existing estimates are 
obtained from nucleation calculations [13, 17, 42]. The 
existing direct calculations have all been performed at 
coexistence conditions [43, 44]. The computed numbers 
cover even a wider range from 20.4 mJ-m“^ in Ref. [17] to 
35 mJ-m“^ in Ref. [42]. This large variability underscores 
the sensitivity of the computed value to the particulars 
of the water model, and to the thermodynamic condi¬ 
tions at which the calculation has been made. In light of 
the mechanism that is proposed for freezing in this work, 
the problem of determining 7 is further compounded by 
the stacking disorder nature of the critical nucleus. Con- 
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sidering the cubic dependence of the nucleation barrier 
on 7 , even the slightest deviation from the experimental 
value can shift the nucleation rate by several orders of 
magnitude. For instance, a seven percent deviation can 
change the nucleation barrier by as much as 22 percent 
which can shift the nucleation rate by several orders of 
magnitude. 

Overall, the existing classical models of water in¬ 
evitably predict certain thermodynamic properties of wa¬ 
ter to be at variance with experiments by a significant 
margin, and a model that predicts all thermodynamic 
properties accurately is yet to be developed [45]. There¬ 
fore, the agreement between the orders of magnitude of 
the computational estimate of the nucleation rate in a 
classical model of water, like TIP4P/Ice, and the corre¬ 
sponding experimental value is difficult to achieve with 
the existing models due to strong sensitivity of the nu¬ 
cleation rate to the particular thermodynamic features of 
the employed water model (e.g. the free-energies of the 
liquid and the solid, and the liquid-solid surface tension). 

Earlier Computational Studies of Nucleation Rate 

It is inherently problematic to compare computational 
estimates of nucleation rate obtained for different force- 
fields using different methodologies. This is not only be¬ 
cause of the large uncertainties associated with the uti¬ 
lized methods (e.g. the validity of classical nucleation the¬ 
ory in the seeding technique), but also due to the empir¬ 
ical and approximate nature of the utilized force-fields, 
which, as shown in the previous section, can shift the 
computed rates by several orders of magnitude. This 
difficulty becomes apparent upon observing the spread 
of the reported computational estimates of the homoge¬ 
neous ice nucleation rate in the literature. Li et al [13, 14] 
and Haji-Akbari et al [15] used forward-flux sampling to 
compute the nucleation rate in the mW system over a 
wide range of temperatures. Their computed rates are 
lower than the corresponding experimental values, but 
are yet a few orders of magnitude higher than the rate 
computed in this work. This can not only be attributed 
to the inherently faster dynamics of the mW system, but 
also to the higher |A/i| of the mW model at deep super- 
coolings, as depicted in Fig. 2a of Ref. [46]. Recently, 
Sanz et al. [17, 46] used the seeding technique to com¬ 
pute the nucleation rates for several water models. This 
interesting approach assumes the validity of the classical 
nucleation theory and the precise crystallographic nature 
of the critical nucleus (e.g. hexagonal ice). The estimated 
uncertainties associated with these- and other- assump¬ 
tions are very large (e.g. error bars in Fig. 7 of Ref. [17]). 
But nevertheless, a comparison at the same reduced con¬ 
ditions suggests that the present rates are lower than the 
those estimated by Sanz et al. In particular, the rates 
reported at Refs. [17, 46] are very sensitive to the a pri¬ 


ori definition of what constitutes a nucleus, as the size 
of the critical nucleus is directly used for estimating the 
nucleation barriers and the nucleation rates. Our ap¬ 
proach, however, does not rely on determining the size of 
the critical nucleus as a prerequisite for computing the 
nucleation rate. 

CONCLUSIONS 

In this work, we establish the feasibility of comput¬ 
ing the rate of homogeneous ice nucleation for realis¬ 
tic molecular models of water. This is significant con¬ 
sidering the difficulties associated with measuring nucle¬ 
ation rates in experiments. However, the computed rates 
for the TIP4P/Ice model are several orders of magni¬ 
tude smaller than the experimental estimates at compa¬ 
rable conditions. This discrepancy is attributed to the 
smaller thermodynamic driving force for the freezing of 
the TIP4P/Ice system relative to experiment. Neverthe¬ 
less, the ability to directly compute rates for a molecular 
model makes it possible, in principle, to parameterize 
molecular force-fields with an eye towards accurate pre¬ 
diction of nucleation rates. In addition, this paves the 
way for studying the kinetics and mechanism of ice nu¬ 
cleation across a wide range of environments, such as 
the atmospherically relevant films, droplets and aerosols. 
Finally, the coarse-grained FFS utilized in this work 
can prove useful in studying disorder-order transitions 
in other slowly-relaxing systems, such as water/gas mix¬ 
tures, ionic liquids, and macromolecular and biomolecu- 
lar systems. In addition to being able to compute nu¬ 
cleation rates, we obtain valuable mechanistic informa¬ 
tion that is not attainable in experiments. In particular, 
we provide a molecular explanation for the initial for¬ 
mation of cubic-rich ice in homogeneous nucleation of 
supercooled water. 

METHODS 

Individual MD simulations are performed using 
LAMMPS [47]. The size of the largest crystalline nucleus 
is chosen as the order parameter. The largest crystallites 
are detected using the Steinhardt qq order parameter [22] 
and the chain exclusion algorithm of Reinhardt et al. [48] . 
Technical specifications of the MD simulations and the 
order parameter can be found in the SI. Rings are de¬ 
tected using the King criteria [49], while DDCs and HCs 
are identified using a novel topological approach, with the 
detection algorithms thoroughly mentioned in the SI. 

Rate calculations are performed applying a novel 
coarse-graining to the forward flux sampling algo¬ 
rithm [21]. The FFS technique is based on sampling the 
nucleation process in stages, by staging milestones be¬ 
tween the liquid and crystalline basins. (See SI for fur- 


ther explanation.) The essence of FFS is thus to identify 
first passage events between the absorbing milestones of 
each iteration. In principle, this should be done by mon¬ 
itoring all the time-continuous trajectories originating at 
any given milestone and by determining the exact times 
at which they cross any of the two absorbing milestones. 
In reality, however, these time-continuous trajectories are 
approximated by solving the discretized versions of the 
equations of motion. As a result, the order parameter 
can only be computed as frequently as every single MD 
step, and any crossings that might occur at intermedi¬ 
ate times will be inevitably ignored. Historically, this 
has been the approach taken in all reported applications 
of the FFS algorithm, with some authors using larger 
sampling times (up to a few MD steps) only out of con¬ 
venience [50]. In the context of crystallization, however, 
it is reasonable to argue that fluctuations in the order pa¬ 
rameter are only meaningful if they occur at time scales 
that are not significantly smaller than the structural re¬ 
laxation time, r^, or the hydrogen-bond relaxation time, 
T/i. One can therefore coarse-grain the FFS algorithm 
by using a larger sampling time, and ignoring any high- 
frequency oscillations in the order parameter that occur 
at intermediate times. In order to test the validity of this 
argument, we carry out a series of FFS calculations of the 
rate of homogeneous crystal nucleation in the LJ system, 
with sampling times spanning over four orders of magni¬ 
tude. We confirm that the cumulative probabilities and 
nucleation rates are virtually insensitive to the selection 
of Ts unless Ts/t^ > 10“^ (Fig- 3). This approach, how¬ 
ever, leads to considerable errors when ^ as the sys¬ 
tem starts losing some of its memory between successive 
samplings of the trajectory. No loss of physically relevant 
information occurs when as the fluctuations of 

the order parameter at times smaller than are not 
representative of physically relevant structural transfor¬ 
mations. However, these high-frequency fluctuations can 
become an issue at extremely small sampling times, as in 
the TIP4P/Ice system. Choosing a large sampling time 
is, thus, conceptually similar to applying a low-pass fil¬ 
ter to the order parameter time series. In the TIP4P/Ice 
system, we choose a sampling time of 1 ps, which is 2- 
3 orders of magnitude smaller than both = 0.6 ns 
(Fig. 2a) and Th = 4.0 ns (Fig. 2c). By doing this, we 
manage to turn an otherwise diverging unsuccessful FFS 
calculation (Fig. S2a) into a converging successful cal¬ 
culation presented in Fig. 1. Further technical details 
about the method, as well the computational cost of the 
calculations are included in the SI. 
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SUPPLEMENTARY INFORMATION 

System Preparation and Molecular Dynamics 
Simulations 

All simulations are carried out in cubic boxes with peri¬ 
odic boundaries. In this work, we study crystallization in 
three different systems: (i) the TIP4P/Ice [33] system as 
one of the best non-polarizable molecular representations 
of water, (ii) the mW system [16] as one of the most popu¬ 
lar and widely used coarse-grained representations of wa¬ 
ter, and (iii) the Lennard-Jones system [51] as a prototyp¬ 
ical simple liquid. In every system, we choose the thermo¬ 
dynamic state points so that the critical nucleus contains 
between 300 and 500 molecules. Consequently, a system 
of 4,096 molecules is sufficient for studying crystalliza¬ 
tion in all these systems. We choose TIP4P/Ice over 
the closely-related- and widely-used- TIP4P/2005 [52] 
model because of its more realistic melting temperature, 
giving rise to slightly higher diffusivities at identical su- 
percoolings. Furthermore, the TIP4P/Ice model provides 
the most quantitatively accurate prediction of the coexis¬ 
tence lines between different ice polymorphs. We prepare 
our starting configurations from a dilute simple cubic lat¬ 
tice, and gradually compress it to the target temperature 
and pressure using NpT MD simulations. We then equili¬ 
brate the system at the target thermodynamic conditions 
for a minimum of lO^r^ with the structural relaxation 
time for each system. The starting configurations of the 
cubic and hexagonal ice are prepared using the unit cells 
proposed by Lekner [53] and Hayward and Reimers [54], 
respectively. 

All Molecular Dynamics (MD) [55] simulations are 
performed in the isothermal-isobaric (NpT) ensemble 
using LAMMPS [47]. We integrate Newton’s equa¬ 
tions of motion using the velocity Verlet algorithm [56]. 
Temperature and pressure are controlled using a Nose- 
Hoover thermostat [57, 58] and a Parrinello-Rahman 
barostat [59]. In the TIP4P/Ice system, long-range elec¬ 
trostatic interactions are computed using the particle- 
particle particle-mesh algorithm [60] with a short-range 
cutoff of 8.5 A. Also, the rigidity of water molecules is 
enforced using the SHAKE algorithm [61]. Table SI gives 
the technical specifications of the MD trajectories for the 
three systems studied in this work. 


Then, a graph is constructed by connecting all solid-like 
neighbors that are within the first nearest-neighbor shell 
of one another. The number of molecules in the largest 
connected sub-domain of this graph is used as the order 
parameter in our FFS calculations. Below we provide the 
particular criteria used for identifying solid-like molecules 
in each system. 


TlP^P/Ice and mW 

A distance cutoff of Tc = 3.2 A is used for defining the 
first nearest neighbor shell. In the TIP4P/Ice system, 
Vc corresponds to the location of the first minimum of 
^oo(^): the oxygen-oxygen radial distribution function, 
of the disordered, cubic and hexagonal phases (Fig. Sla). 
Therefore, the distance between two molecules is defined 
as the distance between their constituent oxygen atoms. 
In the mW system, Vc corresponds to the first minimum 
of g{r) [13]. The qQ{i) order parameter is then computed 
for molecule i as: 


. .s ^ 1 q6(^) -ggp') 

^ mi) Iq6«l|q6(i)l 

with qg(i) = (g6,-6(0)96,-5(0r-- ,96,6(0) given by; 

^ mi) 

q^miS) — AT ( '\ ^ ^ 1 4^ij^ 1 6 ^ TTl ^ 6 

h=l 
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Here Ni){i) is the number of molecules that are within 
the nearest neighbor shell of the ffh molecule, Oij and 
(pij are the spherical angles associated with the displace¬ 
ment vector between ith and jth molecules, and Yim{0, (j)) 
is the /m-th spherical harmonic. As depicted in Fig. Sib, 
a cutoff value of g6,c = 0-3 is suitable for distinguishing 
solid- and liquid-like molecules. In order to eliminate 
chains of locally tetrahedral water molecules, we apply 
the recently developed algorithm of Reinhardt et al [48] . 
Such chains are abundant in supercooled water, and the 
failure in removing them can lead to the detection of 
non-compact crystallites. As shown in Ref. [48], the ap¬ 
plication of this chain exclusion step is pivotal in driving 
crystallization in molecular models of water even when a 
biasing potential is utilized. 


Order Parameter 

We quantify the progress of crystallization using the 
two-step process explained in detail in our earlier publi¬ 
cation [15]. First, every molecule in the system is labelled 
as solid-like or liquid-like based on its local environment, 
with the selection criterion being different from system 
to system (see below). In this work, we use spherical har¬ 
monics to distinguish liquid- and solid-like molecules [22] . 


Hydrogen Bonding and Distance Cutoffs 

Using a distance criterion for identifying solid- and 
liquid-like molecules in supercooled water is a common 
practice in the literature [17, 30]. Alternatively, one 
can use the hydrogen bond network [20]. In deeply su¬ 
percooled water, however, these two approaches are al¬ 
most equivalent as the molecules that are within the first 
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neighbor shell of one another are almost always hydrogen- 
bonded [62]. In order to quantify this, we identify all the 
hydrogen bonds in the system. A distance-based bond 
is a hydrogen bond if the distance between the poten¬ 
tial donor oxygen and the acceptor hydrogen is less than 
2.42 A and the angle between the O—H and the O- • • O 
vectors is less than 30 degrees [63] . We observe that more 
than 98 per cent of distance-based bonds in the entire 
system are hydrogen bonds. This corresponds to a false 
positive of less than one bond per every 20 molecules. For 
the distance-based bonds emanating from the molecules 
that are part of the largest crystallite, this fraction is 
> 99.9 per cent. We are therefore confident that our 
distance-based criterion that is used not only in comput¬ 
ing the order parameter, but also in identifying rings, 
DDCs and HCs, is robust, and our main conclusions will 
be unchanged if the hydrogen bond network is used in¬ 
stead. 


crossing Abasin and is calculated by analysing a long MD 
trajectory in the liquid basin. The configurations that 
correspond to such crossings are stored for future itera¬ 
tions. The next step is to compute {P(A/c+i the 

transition probabilities. In order to compute P(A/c+i | A/^), 
a large number of MD trajectories are initiated from the 
configurations gathered at by randomly picking a con¬ 
figuration one at a time, and randomising its momenta 
according to the Boltzmann distribution. P(A/e+i|A/c) is 
the fraction of those trajectories that cross A/^+i prior to 
crossing Abasin in the opposite direction, and the config¬ 
urations that correspond to such crossings are stored for 
future iterations. This procedure is repeated until a value 
of A AT is reached for which P{XN-\-i\X]sf) = I for every 
Aiv+i > Aiv- The error bars in the flux and the transition 
probabilities are estimated using the procedure outlined 
in Ref. [21]. The rationale behind the coarse-graining of 
FFS is explained in the Methods section in the main text. 


Lennard-Jones 


Technical Specifications of FFS in the TIFfP/Ice System 


We use a distance cutoff of Tc = I.40cr as suggested 
in Ref. [64]. We first compute q6(i) for each molecule i 
using Eq. (3). We then carry out an additional level of 
neighbor averaging as proposed in Ref. [64]: 

Mb) 

with j = 0 corresponding to the ith particle itself. The 
scalar qq order parameter is defined as: 





m =—6 


( 5 ) 


Based on Fig. 1(b) of Ref. [64] we use a cutoff ^ = 0.3 
to distinguish solid- and liquid-like LJ atoms. 


Forward-flux Sampling 

The forward-flux sampling (FFS) algorithm [21] is 
based on partitioning the configuration space into non¬ 
overlapping regions separated by milestones that are iso¬ 
surfaces of the order parameter [15]. The first milestone, 
denoted by Abasin, is chosen in the middle of the liq¬ 
uid basin and is therefore frequently crossed by a typ¬ 
ical MD trajectory of the liquid. The other milestones 
Abasin = Ao < Ai < A 2 < • • • are chosen so that Xk 
is accessible to the trajectories initiated at Xk-i with a 
sufficiently large probability. We have outlined the cri¬ 
teria used for placing Abasin and A^’s in our earlier pub¬ 
lication [15]. The nucleation rate is then computed as 
R = 4>o n^i ^(^/c+i|A/c). Here, 4>o is number of trajec¬ 
tories per unit volume per unit time that cross Ai after 


We perform our FFS calculations using an in-house 
C++ program discussed in detail elsewhere [15]. This 
computer program links against the LAMMPS static li¬ 
brary and uses it as its internal MD engine. The basin 
simulations are carried out for 694 ns, and the first two 
milestones are placed at Abasin = 5 and Ai = 10. We start 
the second stage of the algorithm after gathering 1,685 
configurations at Ai = 10. During the FFS iterations in 
the inflection region, we demand a minimum of ~ 1,000 
crossings at every milestone. After the inflection region, 
however, we terminate each iteration after 300 — 350 
crossings. Fig. S7a depicts the average success and fail¬ 
ure times for the trajectories that are initiated at each 
milestone. These are the average times that it takes for 
a trajectory to cross the next milestone and to return to 
the liquid basin, respectively. Note that the average suc¬ 
cess and failure times are very large at the latest stages 
of the FFS algorithm, of the order of tens to hundreds of 
nanoseconds. This is consistent with earlier observations 
by Reinhardt et al [48] that in molecular models of water, 
the critical nucleus (as obtained from umbrella sampling 
simulations in their work) might not grow or shrink even 
after tens of nanoseconds of regular MD. 


Critical Nucleus Size 

The systematic way of determining the size of the crit¬ 
ical nucleus is to perform committor analysis [65]. The 
committor probability of any given configuration is the 
probability that a random MD trajectory initiated from 
that configuration crystallizes before returning to the liq¬ 
uid basin. For the critical nucleus, the committor prob¬ 
ability is exactly one half as a critical nucleus is equally 
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likely to melt or to grow. Therefore, the critical nucleus 
size is the average size of the largest crystallites in the 
ensemble of all critical configurations. The problem of 
determining committor probabilities for a family of ’sus- 
pected‘ critical nuclei is, however, computationally ex¬ 
pensive and can take up to several months even on a 
large supercomputer. Since the notion of criticality only 
plays a descriptive role in our discussion, we use an al¬ 
ternative approach by assuming that the order parameter 
utilized in our FFS calculations is a good reaction coor¬ 
dinate. Therefore, the committor probability can be ap¬ 
proximated as pci^k) = with be¬ 

ing the final milestone of the FFS calculation. Fig. S7b 
gives pcW vs. A for the FFS calculation of the nucle- 
ation rate in the TIP4P/Ice system. The critical nucleus 
has around 320 ± 20 water molecules. 


Computational Cost 

As can be seen in Fig. S7a, the average failure and 
success times are very large at the final iterations of our 
FFS calculations. As a result, we carried out a total 
of 608 yus of MD trajectories. This amounts to a total 
of 21,452,433 CPU-hours on the Texas-based Stampede 
supercomputer. Due to the embarrassingly parallel na¬ 
ture of the FFS algorithm, we were able to distribute this 
very costly calculation across the following supercomput¬ 
ers: the Princeton-based Della and Tiger supercomput¬ 
ers, the TACC-based[? ] Stampede supercomputer, the 
SDSC-based[? ] Gordon supercomputer, and the RPI- 
based[? ] Blue Gene/Q supercomputer. 

Shape of the Largest Crystallite 

We use {ri}fLi^ the positions of the oxygen atoms 
in the largest crystallite to compute its gyration ten¬ 
sor g := (1/A^) - i*cm][d - tcm]^ with vcm = 

S three real eigenvalues > 

a^. The asphericity, k is computed as: 

2 _ 3 gf -h 0^2 + 0^3 _ 1 . . 

2 {a‘1 + 0^2 + <^3)^ 2 

Ring Statistics 

The first step in identifying rings is to define a near¬ 
est neighbor network. For this purpose, we use the same 
distance criterion utilized in the definition of the Qq or¬ 
der parameter. As explained earlier, over 98 per cent of 
nearest neighbor pairs are indeed hydrogen bonded. We 
then use the criteria proposed by King [49] to identify 
all the primitive rings in the system, and get rid of rings 
that share more than three consecutive water molecules. 


In the TIP4P/Ice and mW systems, we identify rings of 
up to eight molecules, while in the Lennard-Jones sys¬ 
tem, we confine ourselves to rings of six atoms or less. 
The hexagonal rings detected in the TIP4P/Ice and mW 
systems are then used for detecting double-diamond and 
hexagonal cages. 

Double-diamond Cages 

We loop through all the hexagonal rings in the system 
and use the topological features of a DDC (Fig. S5a) 
to detect double-diamond cages as follows. A hexagonal 
ring, R = (mi, m2, • • • with rrik denoting the kth 

molecule that is part of the ring- is the equatorial ring 
of a DDC if the following conditions are satisfied: 

1. For every 1 < /c < 6, a minimum of three other 
hexagonal rings pass through rrik- 

2. For every triplet = (m*,there 
is at least one hexagonal ring other than Rq that 
passes through rrik^ m/c0i and mk^ 2 - Here a 0 6 = 
(a 0 b) mod 6. 

3. If {Rk,j}^Li the hexagonal rings other than Rq 

that pass through T/^, there must exist ji,^2, * * * ,4*6 
so that n R 3 J 3 n R5 J5 0 and ^2,^2 ^ ^4,0 ^ 
Reje 7^ 0- Also Riji flRsjg, R3J3 etc must 

all have three molecules. 


Hexagonal Cages 

The two hexagonal rings Ri = " Je) and R 2 = 

(mi, m2, • • • , me) can be the basal planes of a hexagonal 
cage (Fig. S5b) if the following conditions are satisfied: 

1. Ri n R2 = 0 . 

2. There exists 1 < k < 6 so that rrik is a neighbor of 
li or l 2 ^ as defined based on the distance criterion. 

3. If rrik is a neighbor of /i, mk ®2 and rrik^A must be 
neighbors of I 3 and I 3 (or I 3 and l^)^ respectively. 
Adjusting the algorithm to the case of rrik being a 
neighbor of I 2 is straightforward. 


DDC/HC Networks 

The cages that share a minimum of one water molecule 
are clustered together to form a network of intercon¬ 
nected DDC/HC cages. Due to the topological crite¬ 
rion used in the identification of their constituent cages, 
such networks can have water molecules that might be 
detected as ’liquid-hke‘ when the qq order parameter is 
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used. This explains the presence of liquid-like molecules 
in the DDC/HC networks shown in Figs. 3 and 4. 


Basal and Prismatic Connection 

We determine the type of connection between two 
neighboring hexagonal cages based on the number of wa¬ 
ter molecules that they share. As can be seen in Fig. S5d, 
two hexagonal cages that are connected through their 
basal plane have six water molecules in common, while 
two cages that are connected through their prismatic 
planes (Fig. S5e) have four molecules in common. 


both systems are performed in the NpT ensemble. Each 
such simulation is comprised of an initial equilibration 
period of one nanosecond followed by a five-nanosecond 
production run. The time averages of enthalpies are com¬ 
puted for 230 K < T < 272 K during the production runs 
and are then used for computing the free energy differ¬ 
ence using the following equation: 

Mhex(T) — /iliq(T) _ hhex(T) — 

T “ Jr ? 

Here, hx corresponds to the molar enthalpy of X= liq, 
hex. The findings of these free energy calculations are 
given in the SL 


(7) 


Thermodynamic Integration 

For computing the free energy difference between su¬ 
percooled water and hexagonal ice, MD simulations of 
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TABLE SI: Technical specifications of the MD simulations and the order parameter. For the LJ system, all 
quantitates are in the LJ dimensionless units. 



TIP4P/lce 

mW 

Lennard-Jones 

time step 

2 fs 

2 fs 

0.00002-0.0025 

thermostat time constant 

200 fs 

200 fs 

0.25 

barostat time constant 

2 ps 

2 ps 

2.5 

Distance cutoff, rc 

3.2 A 

3.2 A 

1.40 

Type of qg 

Regular 

Regular 

neighbor-averaged 

qe.c 

0.5 

0.5 

0.3 



FIG. SI: Calibration of the order parameter: (a) Oxygen-oxygen radial distribution function and (b) the distribution of 
the qe order parameter for the cubic and hexagonal polymorphs of ice, and for the supercooled liquid, computed from a 20-ns 
NpT MD simulation of the TIP4P/Ice system at 230 K and 1 bar. The distance and qe cutoffs, rc = 3.2 A and qe^c = 0.5 are 
both marked with dark dashed lines. 




FIG. S2: The failure of the conventional FFS approach in the TIP4P/Ice system at 230 K and 1 bar. All 

symbols are obtained from actual simulations, while the dashed lines are schematic representations of what would happen upon 
performing more FFS iterations, (a) P(A|Ai) vs. A does not have the positive curvature observed in successful FFS calculations 
presented in the main Figs. 1, S3a and S4a. (b) Average failure times for trajectories aimed at A. Beyond A ~ 30, this average 
failure time plateaus. This suggests that the addition of new water molecules to the largest crystallites is only nominal and 
does not lead to a meaningful improvement in the overall structural quality of the arising configurations. We observe a strong 
correlation between the plateauing of the average failure time and the failure of the corresponding FFS calculation, and based 
on this heuristic, we terminate the calculation depicted in (a) at A 40. Gontrast this to the strictly increasing average failure 
time in the successful FFS calculation in the mW system. 
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FIG. S3: Crystallization of the LJ system close to the triple point. FFS calculations are performed at ksTje = 0.48 
and pa^ je = 0. (a) No inflection is observed in the cumulative probability curve. Furthermore, (b) the dimensions and (c) 
the asphericity of the largest crystallite, (d) the number of three-, four- and hve-member rings and (e) the density of the 
system change monotonically between the liquid and the crystal. The observed lack of inflection and non-monotonicity in the 
calculations presented here reveal that the trends presented in the insets of Figs.l and 4 are also observed in low-pressure LJ 
systems. 
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FIG. S4: Ice nucleation in the mW system at 230 K and 1 bar. (a) Cumulative probability, (b) cage participation, (c) 
shape and (d) asphericity of the largest crystallite, (e) number of hve-, six- and seven-member rings and (f) density as a function 
of the size of the largest crystallite. Note that the inflection in cumulative probability, and the associated non-monotonicities 
in density, asphericity and ring statistics are very mild in the mW system, and no monotonicity exists in the dimensions and 
the radius of gyration of the largest crystallite. 
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FIG. S5: Topological features and growth characteristics of different cages, (a) Topological features of a double¬ 
diamond cage. Every DDC has one equatorial ring Rq, and six peripheral rings Ri,--- ^Re- Every water molecule in Ro 
participates in four hexagonal rings. Eor instance, 5 participates in R 3 ,R 4 and R^ in addition to Rq. Every triplet along 
is crossed by exactly one other ring in the DDC. Eor instance, the triplet (1,2,3) is crossed by Ri. The three top peripheral 
rings, i^i, R 3 and i^ 5 , and the three bottom peripheral rings, i^ 2 , Ra and Rq^ each have one water molecule in common, namely 
10 and 14, respectively, (b) Topological features of a hexagonal cage. Ri and R 2 are the basal planes of the cage, while R^^Ra 
and R 3 are the prismatic planes. These are not real two-dimensional planes due to their bending as a result of tetrahedral 
arrangement of hydrogen bonds, (c-e) Schematic representation of the available pathways for the formation of new DDCs and 
HCs. (c) Each DDC has six identical six-member rings that can act as anchoring points for new DDCs or HCs. (d-e) Each 
HC has two distinct sets of six-member rings as anchoring points for new cages. The basal plane (d) of a hexagonal cage can 
support the attrition of both HCs and DDCs. The prismatic plane of an HC (e), however, only supports the attrition of new 
HCs. There are far fewer basal connections in the system as depicted in (f). 
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EIG. S6: Non-monotonicities in ring statistics and density. Distribution of ring populations (a-c) and densities (d) 
in conhgurations that are rich in DDCs (blue) and rich in HCs (red). In each panel, p is the probability that these distinct 
distributions are statistically indistinguishable, and is computed from student’s t-test analysis. In order to better visualize these 
distributions, a Gaussian with the same mean and standard deviation is plotted for every distribution. DDC- and HC-rich 
conhgurations are distinguished using the k-mean clustering algorithm. 
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FIG. S7: Computational cost and the approximate commitor probability (a) Average success and failure times for 
the trajectories initiated at different iterations of our FFS calculation in the TIP4P/Ice system, (b) pc (A) vs. A for the FFS 
calculation of the nucleation rate in the TIP4P/Ice system. The critical nucleus has 320 ± 20 water molecules. 











